Search for: All records

Creators/Authors contains: "Fernando, Milinda"


  1. Simulations to calculate a single gravitational waveform can take several weeks, yet thousands of such simulations are needed for the detection and interpretation of gravitational waves (GWs), and future detectors will require even more accurate waveforms than those in current use. We present here the first large-scale, adaptive-mesh, multi-GPU numerical relativity (NR) code, together with performance analysis and benchmarking. While direct comparisons are difficult to make, our GPU extension of the Dendro-GR NR code achieves a 6x speedup over existing state-of-the-art codes. We achieve 800 GFlop/s on a single NVIDIA A100 GPU, an overall 2.5x speedup over an equivalent CPU implementation on a two-socket, 128-core AMD EPYC 7763 node. We present detailed performance analyses, parallel scalability results, and accuracy assessments for GWs computed for mass ratios q = 1, 2, 4. We also demonstrate strong scaling up to 8 A100s and weak scaling up to 229,376 x86 cores on the Texas Advanced Computing Center's Frontera system.
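A back-of-the-envelope illustration of how the reported figures relate, as a minimal C++ sketch: the flop count and timings below are hypothetical, chosen only to reproduce the abstract's 800 GFlop/s and 2.5x ratios, and are not measurements from the paper.

```cpp
#include <cstdio>

int main() {
    // Hypothetical inputs -- not the paper's raw data.
    double flops_in_region = 4.0e12;  // floating-point ops in the timed region
    double gpu_time_s      = 5.0;     // wall-clock time on one A100 (assumed)
    double cpu_node_time_s = 12.5;    // 128-core EPYC 7763 node, same work (assumed)

    double gpu_gflops = flops_in_region / gpu_time_s / 1.0e9;  // -> 800 GFlop/s
    double speedup    = cpu_node_time_s / gpu_time_s;          // -> 2.5x

    std::printf("GPU throughput: %.0f GFlop/s\n", gpu_gflops);
    std::printf("Speedup over CPU node: %.1fx\n", speedup);
    return 0;
}
```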
  2. Efficiently and accurately simulating partial differential equations (PDEs) in and around arbitrarily defined geometries, especially with high levels of adaptivity, has significant implications for many application domains. A key bottleneck in this process is the fast construction of a ‘good’ adaptively refined mesh. In this work, we present an efficient, novel, octree-based adaptive discretization approach capable of carving out arbitrarily shaped void regions from the parent domain: an essential requirement for fluid simulations around complex objects. Carving out objects produces an incomplete octree, and we develop efficient top-down and bottom-up traversal methods to perform finite element computations on incomplete octrees. We validate the framework by (a) presenting appropriate convergence analyses and (b) computing the drag coefficient for flow past a sphere across a wide range of Reynolds numbers (O(1) to O(10^6)) encompassing the drag-crisis regime. Finally, as part of a current project to evaluate COVID-19 transmission risk in classrooms, we deploy the framework on a realistic geometry.
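The traversal idea can be sketched in a few lines of C++. This assumes a simple pointer-based octree with a hypothetical OctNode type; Dendro's actual octrees are linear (array-based) and distributed, so this illustrates skipping carved subtrees rather than the paper's implementation.

```cpp
#include <array>
#include <functional>
#include <memory>

struct OctNode {
    bool carved = false;                          // lies inside a void (object) region
    std::array<std::unique_ptr<OctNode>, 8> kids; // all empty => leaf
    bool isLeaf() const { return !kids[0]; }
};

// Top-down traversal of an incomplete octree: visit only leaves that belong
// to the fluid domain, skipping subtrees carved out by the immersed geometry.
void traverse(const OctNode& n, const std::function<void(const OctNode&)>& visit) {
    if (n.carved) return;            // hole in the octree: no work here
    if (n.isLeaf()) { visit(n); return; }
    for (const auto& k : n.kids)
        if (k) traverse(*k, visit);
}

int main() {
    OctNode root;
    for (auto& k : root.kids) k = std::make_unique<OctNode>();
    root.kids[3]->carved = true;     // carve one octant out of the domain
    int leaves = 0;
    traverse(root, [&](const OctNode&) { ++leaves; });
    return leaves == 7 ? 0 : 1;      // 7 of the 8 octants remain
}
```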
  3. Numerically solving partial differential equations (PDEs) remains a compelling application of supercomputing resources. The next generation of computing resources, exhibiting increased parallelism and deep memory hierarchies, provides an opportunity to rethink how we solve PDEs, especially time-dependent PDEs. Here, we consider time as an additional dimension and simultaneously solve for the unknown in large blocks of time (i.e., in 4D space-time), instead of the standard approach of sequential time-stepping. We discretize the 4D space-time domain using a mesh-free kD tree construction that enables good parallel performance as well as on-the-fly construction of adaptive 4D meshes. To make the best use of the 4D space-time mesh adaptivity, we invoke concepts from PDE analysis to establish rigorous a posteriori error estimates for a general class of PDEs. We solve canonical linear as well as nonlinear PDEs (heat diffusion, advection-diffusion, and Allen-Cahn) in space-time and illustrate the following advantages: (a) sustained scaling across larger processor counts than sequential time-stepping approaches, (b) the ability to capture "localized" behavior in space and time using the adaptive space-time mesh, and (c) removal of time-stepping constraints such as the Courant-Friedrichs-Lewy (CFL) condition, together with the ability to use spatially varying time-steps. We believe that the algorithmic and mathematical developments shown in this work, along with efficient deployment on modern architectures, constitute an important step towards improving the scalability of PDE solvers on the next generation of supercomputers.
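To make the removed constraint concrete, the following minimal C++ sketch evaluates the explicit-stepping CFL bound dt <= C dx / |v|; the cfl_dt helper and all numbers are illustrative assumptions. Under sequential time-stepping on an adaptive mesh, the finest cell dictates dt for the whole grid, which is precisely the restriction the space-time formulation lifts.

```cpp
#include <cmath>
#include <cstdio>

// Largest stable explicit time step for advection speed v on cell spacing dx.
double cfl_dt(double dx, double v, double courant = 0.5) {
    return courant * dx / std::fabs(v);
}

int main() {
    // With sequential time-stepping, dt(fine) would be imposed everywhere.
    double dx_coarse = 1.0 / 64.0, dx_fine = 1.0 / 4096.0, v = 1.0;
    std::printf("dt(coarse) = %.2e, dt(fine) = %.2e\n",
                cfl_dt(dx_coarse, v), cfl_dt(dx_fine, v));
    return 0;
}
```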
  4. We present a portable and highly scalable framework that targets problems in the astrophysics and numerical relativity communities. The framework combines the parallel Dendro octree with wavelet adaptive multiresolution and an automatically code-generated physics module to solve the Einstein equations of general relativity in the BSSNOK formulation. The goal of this work is to perform advanced, massively parallel numerical simulations of binary black hole and neutron star mergers, including intermediate mass ratio inspirals (IMRIs) of binary black holes with mass ratios on the order of 100:1. These simulations will produce waveforms for use in LIGO data analysis and for calibrating approximate methods of generating gravitational waveforms. The key contribution of this work is the development of automatic code generators for computational relativity supporting SIMD vectorization, OpenMP, and CUDA, combined with efficient distributed-memory adaptive data structures. These have enabled the development of efficient codes that demonstrate excellent weak scalability up to 131K cores on ORNL's Titan for binary mergers with mass ratios up to 100.
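A minimal C++ sketch of the kind of refinement test wavelet adaptive multiresolution relies on: predict a fine-level value from coarser neighbors and refine where the prediction error (the detail, or wavelet, coefficient) is large. The 1-D linear prediction and the needsRefinement helper are simplifying assumptions; a production code would use higher-order, multi-dimensional operators.

```cpp
#include <cmath>

// Refinement test for the midpoint between coarse samples fLeft and fRight,
// using linear interpolation as a (simplified) prediction operator.
bool needsRefinement(double fLeft, double fMid, double fRight, double tol) {
    double predicted = 0.5 * (fLeft + fRight);      // coarse-grid prediction
    double wavelet   = std::fabs(fMid - predicted); // detail coefficient
    return wavelet > tol;
}

int main() {
    bool smooth = needsRefinement(1.0, 1.01, 1.02, 1e-3); // smooth data: keep coarse
    bool kink   = needsRefinement(0.0, 1.00, 0.00, 1e-3); // kink: refine here
    return (!smooth && kink) ? 0 : 1;
}
```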
  5. Load balancing and partitioning are critical for parallel computations. Popular partitioning strategies based on space-filling curves focus on equally dividing work, and the partitions produced are independent of the architecture and the application. Given the ever-increasing relative cost of data movement and the increasing heterogeneity of our architectures, it is no longer sufficient to consider only an equal partitioning of work; minimizing communication costs is equally, if not more, important. Our hypothesis is that an unequal partitioning that significantly reduces communication costs can scale and perform better than conventional equal-work partitioning schemes, with a tradeoff that depends on both the architecture and the application. We validate this hypothesis in the context of a finite element computation using adaptive mesh refinement. Our central contribution is a new partitioning scheme that minimizes the overall runtime of subsequent computations by performing architecture- and application-aware non-uniform work assignment, decreasing time to solution primarily by minimizing data movement. We evaluate the algorithm against standard space-filling-curve-based partitioning algorithms, observing both time to solution and energy to solution for finite element computations on adaptively refined meshes. We demonstrate excellent scalability of the new partitioning algorithm up to 262,144 cores on ORNL's Titan and show that the proposed scheme reduces both overall energy and time to solution for application codes by up to 22.0%.
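For contrast, here is a minimal C++ sketch of the conventional baseline the paper compares against: cutting SFC-ordered octants into equal-work pieces via a prefix sum over per-octant weights. The equalWorkSplits function is a hypothetical simplification; the paper's contribution, skewing these cuts using architecture and application information to reduce communication, is not reproduced here.

```cpp
#include <cstdio>
#include <numeric>
#include <vector>

// Given octant work weights in space-filling-curve order, return the index of
// the first octant owned by each of p ranks (greedy equal-work splitting).
std::vector<size_t> equalWorkSplits(const std::vector<double>& w, int p) {
    double total = std::accumulate(w.begin(), w.end(), 0.0);
    std::vector<size_t> starts(p, w.size());
    double prefix = 0.0;
    int rank = 0;
    for (size_t i = 0; i < w.size(); ++i) {
        while (rank < p && prefix >= rank * total / p) starts[rank++] = i;
        prefix += w[i];
    }
    return starts;
}

int main() {
    std::vector<double> w = {1, 1, 4, 1, 1, 4, 1, 1}; // non-uniform octant costs
    for (size_t s : equalWorkSplits(w, 4)) std::printf("%zu ", s);
    std::printf("\n"); // prints the SFC cut points, e.g. "0 3 4 6"
    return 0;
}
```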